Statistical Translation, Heat Kernels and Expected Distances

Authors

  • Joshua V. Dillon
  • Yi Mao
  • Guy Lebanon
  • Jian Zhang
Abstract

High dimensional structured data such as text and images is often poorly understood and misrepresented in statistical modeling. The standard histogram representation suffers from high variance and performs poorly in general. We explore novel connections between statistical translation, heat kernels on manifolds and graphs, and expected distances. These connections provide a new framework for unsupervised metric learning for text documents. Experiments indicate that the resulting distances are generally superior to their more standard counterparts.

Similar resources

Information Diffusion Kernels

A new family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. Based on the heat equation on the Riemannian manifold defined by the Fisher information metric, information diffusion kernels generalize the Gaussian kernel of Euclidean space, and provide a natural way of combining generative statistical modeling with non-parametric discr...

Accurate and Efficient Computation of Laplacian Spectral Distances and Kernels

This paper introduces the Laplacian spectral distances, as a function that resembles the usual distance map, but exhibits properties (e.g., smoothness, locality, invariance to shape transformations) that make them useful to processing and analyzing geometric data. Spectral distances are easily defined through a filtering of the Laplacian eigenpairs and reduce to the heat diffusion, wave, bi-har...

Diffusion Kernels on Statistical Manifolds

A family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. The kernels are based on the heat equation on the Riemannian manifold defined by the Fisher information metric associated with a statistical family, and generalize the Gaussian kernel of Euclidean space. As an important special case, kernels based on the geometry of multinomia...

Expected Sequence Similarity Maximization

This paper presents efficient algorithms for expected similarity maximization, which coincides with minimum Bayes decoding for a similarity-based loss function. Our algorithms are designed for similarity functions that are sequence kernels in a general class of positive definite symmetric kernels. We discuss both a general algorithm and a more efficient algorithm applicable in a common unambigu...

Evolution of dispersal distance: maternal investment leads to bimodal dispersal kernels.

Since dispersal research has mainly focused on the evolutionary dynamics of dispersal rates, it remains unclear what shape evolutionarily stable dispersal kernels have. Yet, detailed knowledge about dispersal kernels, quantifying the statistical distribution of dispersal distances, is of pivotal importance for understanding biogeographic diversity, predicting species invasions, and explaining r...

Publication date: 2007